Sensitive pattern discovery with 'fuzzy' alignments of distantly related proteins
Abstract
MOTIVATION: Evolutionary comparison leads to efficient functional characterisation of hypothetical proteins. Here, our goal is to map specific sequence patterns to putative functional classes. The evolutionary signal stands out most clearly in a maximally diverse set of homologues. This diversity, however, leads to a number of technical difficulties. The targeted patterns, as gleaned from structure comparisons, are too sparse for statistically significant signals of sequence similarity and accurate multiple sequence alignment.
RESULTS: We address this problem by a fuzzy alignment model, which probabilistically assigns residues to structurally equivalent positions (attributes) of the proteins. We then apply multivariate analysis to the 'attributes × proteins' matrix. The dimensionality of the space is reduced using non-negative matrix factorization. The method is general, fully automatic and works without assumptions about pattern density, minimum support, explicit multiple alignments, phylogenetic trees, etc. We demonstrate the discovery of biologically meaningful patterns in an extremely diverse superfamily related to urease.
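The dimensionality-reduction step can be illustrated with a minimal sketch. The abstract does not say which non-negative matrix factorization algorithm or rank was used, so the code below is an assumption: it applies the standard Lee-Seung multiplicative updates to a small, made-up 'attributes × proteins' matrix. The function name `nmf`, the toy values in `V` and the choice `rank=2` are hypothetical and are not taken from the paper.

```python
import numpy as np


def nmf(V, rank, n_iter=500, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V as W @ H.

    Uses Lee-Seung multiplicative updates for the squared-error
    objective. This is a generic NMF sketch, not the paper's code.
    """
    rng = np.random.default_rng(seed)
    n_attr, n_prot = V.shape
    # Random non-negative starting point (hypothetical initialisation).
    W = rng.random((n_attr, rank)) + eps
    H = rng.random((rank, n_prot)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H
        # stay non-negative throughout.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy 'attributes x proteins' matrix (invented values): each entry
# stands for the probability that a protein (column) carries a given
# residue pattern at a structurally equivalent position (row).
V = np.array([
    [0.9, 0.8, 0.0, 0.1],
    [0.7, 0.9, 0.1, 0.0],
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.9, 0.7],
    [0.5, 0.5, 0.5, 0.5],
])

W, H = nmf(V, rank=2)

# Relative reconstruction error of the rank-2 approximation.
rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print("W shape:", W.shape, "H shape:", H.shape)
print("relative error:", round(float(rel_error), 3))
```

In this sketch the columns of `W` play the role of candidate patterns over attributes, and the rows of `H` describe how strongly each protein expresses each pattern. How the paper interprets the factors is not stated in the abstract.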
Similar resources
PASS2 version 4: An update to the database of structure-based sequence alignments of structural domain superfamilies
Accurate structure-based sequence alignments of distantly related proteins are crucial in gaining insight about protein domains that belong to a superfamily. The PASS2 database provides alignments of proteins that are related at the superfamily level and characterized by low sequence identity. We thus report an automated, updated version of the superfamily alignment database known as PASS2.4, consis...
The Context-Dependence of Amino Acid Properties
One of the current limitations of using sequence alignments to identify proteins with similar structures is that some proteins with similar structures do not have significant sequence similarity by identity. One way to address this "hidden-homology" problem is to match amino acids based on their chemical and physical properties. However, the amino acid properties overlap, creating orthogonal di...
Combining sequence and structure information in protein alignments
For distantly related proteins, alignments based on structural information are more reliable than traditional sequence alignments. However, when structural comparison leaves some ambiguity in alignment, sequence information can provide valuable additional information to discriminate between multiple alternatives. In this paper we present a Bayesian model that incorporates sequence information int...
Statistical potential-based amino acid similarity matrices for aligning distantly related protein sequences.
Aligning distantly related protein sequences is a long-standing problem in bioinformatics, and a key to successful protein structure prediction. Its importance has increased recently in the context of structural genomics projects because more and more experimentally solved structures are available as templates for protein structure modeling. Toward this end, recent structure prediction methods...
A Space-Efficient Approach towards Distantly Homologous Protein Similarity Searches
Protein similarity searches are a routine job for molecular biologists where a query sequence of amino acids needs to be compared and ranked against an ever-growing database of proteins. All available algorithms in this field can be grouped into two categories – either solving the problem using sequence alignment through dynamic programming, or, employing certain heuristic measures to perform a...
Journal title:
- Bioinformatics
Volume 19 Suppl 1
Publication date 2003